Text keyword extraction method based on word frequency statistics

LUO Yan, ZHAO Shuliang, LI Xiaochao, HAN Yuhui, DING Yafei

Journal of Computer Applications 2016, 36 (3): 718-725. DOI: 10.11772/j.issn.1001-9081.2016.03.718


To address the low efficiency and poor accuracy of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm in keyword extraction, a text keyword extraction method based on word frequency statistics is proposed. First, a formula for the number of words that share the same frequency in a text is derived from Zipf's law. Second, the proportion of words at each frequency is determined from that formula; most words turn out to be low-frequency words. Finally, a TF-IDF algorithm based on word frequency statistics is obtained by applying this word frequency law to keyword extraction. Simulation experiments were conducted on Chinese and English text data sets. The average relative error of the same-frequency word formula was no more than 0.05, and the maximum absolute error of the proportion of words at each frequency was 0.04. Compared with the traditional TF-IDF algorithm, the proposed algorithm achieved higher average precision, average recall and average F1-measure, with a lower average runtime. The simulation results show that, for text keyword extraction, the TF-IDF algorithm based on word frequency statistics outperforms the traditional TF-IDF algorithm in precision, recall and F1-measure, and effectively reduces runtime.
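For orientation, the two quantities the abstract builds on can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: `tf_idf` implements the textbook TF-IDF baseline (relative term frequency times the natural log of inverse document frequency), and `frequency_spectrum` merely measures the observed share of distinct words that occur exactly n times, which is the quantity the paper's Zipf-derived formula is said to predict. The function names, the tokenized-list input format and the unsmoothed IDF are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def frequency_spectrum(tokens):
    """Map each frequency n to the share of distinct words occurring exactly n times."""
    counts = Counter(tokens)
    spectrum = Counter(counts.values())
    vocab_size = len(counts)
    return {n: k / vocab_size for n, k in sorted(spectrum.items())}


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms of one document's score dict."""
    ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```

Calling `frequency_spectrum` on a real corpus typically shows that words occurring once or twice make up most of the vocabulary, which matches the abstract's remark that most words are low-frequency. How the paper uses that observation to modify TF-IDF is not stated in the abstract, so it is not reproduced here.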
